perm filename RECOG.LET[1,JMC] blob
sn#005233 filedate 1971-10-15 generic text, type T, neo UTF8
00010 Dr. Lawrence Roberts
00020 Advanced Research Projects Agency
00030 Alexandria, Virginia
00040
00050
00100 Dear Larry:
00200
00300 I think ARPA should look into the possibilities of making use
00400 of the efforts that Information International has put into character
00500 recognition. As I see it, the situation is as follows:
00600
00700 1. The character recognition problem has not been solved in
00800 general even though readers exist for special fonts. This is the
00900 claim of Dan Forsyth and others at III. I don't know that it is so,
01000 but I believe it, and I think it should be looked into.
01100
01200 2. File sizes are getting to the point where it is becoming
01300 feasible to put the world's literature into computer files, and it is
01400 worthwhile to do this large one-shot task. I believe that the
01500 Defense Department will find that it has enough literature of its own
01600 to put into computer files to justify an effort to do so.
01700
01800 3. At its own expense III has developed a system called
01900 GRAFIX 1 for character reading. The system consists of a PDP-10, a
02000 special Binary Image Processor (BIP), and a lot of software. At
02100 present, they can do a number of moderately impressive demonstrations
02200 of character reading.
02300
02400 4. III's original objective in developing GRAFIX 1 was to
02500 make a lot of money by selling such systems. At present, they seem rather
02600 discouraged about this, partly because of the general state of the
02700 economy, and, I would guess, partly because they don't have as good
02800 an idea salesman as Fredkin was. The company won't go broke if they
02900 scrap GRAFIX, because they have quite a bit of cash, and the FR80
03000 computer output microfilm system seems to be a successful product.
03100
03200 5. Nevertheless, there seems to me to be a substantial
03300 probability that the effort so far put into character recognition in
03400 this project will be lost, just as using it on a
03500 large scale to get the scientific literature into computer files is
03600 becoming a real possibility.
03700
03800 6. My former student Takayasu Ito, who now heads a group in
03900 Mitsubishi, is talking to III about buying the whole project at what
04000 amounts to used computer prices, i.e. 500K. Fenaughty may be
04100 inclined to sell it to him although he is looking into the
04200 possibility of finding a Japanese buyer at a higher price.
04300
04400 There are several things ARPA could do about the situation:
04500
04600 1. The most straightforward option is to give III a research
04700 contract at about $220K per year which would pay for the present
04800 level of effort without amortizing any of their expenses. When the
04900
05000 2. An ARPA contractor could be encouraged to buy a machine or
05100 buy out the project. This would cost more, since the machine contains
05200 a PDP-10, and would probably result in dissipating the present
05300 research group. It also offers the difficulty that none of the
05400 present ARPA contractors are in that line of research.
05500
05600 3. ARPA could arrange for III to get production contracts for
05700 putting documents into computers, of sufficient size to keep the group going.
05800 There will be more motivation to do this after the terabit file and
05900 some of its brothers are working, but work done in this direction
06000 will not be lost if the material converted has permanent value.
06100
06200 The first alternative seems to me to be the best, but the
06300 others are tolerable. If the second is chosen, it would be better if
06400 the contractor in question were someone other than Stanford since I
06500 am on the board of directors of III.
06600
06700
06800 Sincerely yours,
06900
07000
07100 John McCarthy